5_3_Classical_Decomposition

Author

Juan Garbayo - TS Analysis - Spring 2023

References

NOTE: this material is not original, but rather an extension of the following reference, which is publicly available.

  1. Hyndman, R.J., & Athanasopoulos, G., 2021. Forecasting: principles and practice. 3rd edition. Link

Packages

library(fpp3)

Classical decomposition:

It is the starting point for most other methods of time series decomposition. In addition, it is simple enough that it can be easily implemented from scratch, which is something we are going to do in class.

Essentially are 2 variants of classical decomposition:

  • Classical decomposition of aditive schemes
  • Classical decomposition of multiplicative schemes

The algorithm depends on whether the length seasonal period is even or uneven. Examples:

  • If we have daily data we may consider, for example, a period of one week. That seasonal period would be of length 7 (7 days).
    • NOTE: For daily data we could also consider other seasonal periods: one month, one quarter… even one year.
  • If we have monthly data, we could consider a yearly seasonal period. That seasonal period would be of length 12 (12 months).
  • If we have quarterly data, we could consider a yearly seasonal period. That seasonal period would be of length 4 (4 quarters).

Because the algorithm approximates the trend with a centered moving average, the even or uneven nature of the seasonal period is important. See the separate notebook on moving averages, which has an associated video to understand this:

  • Even seasonal period (\(m\) even): trend approximated by a 2x\(m\) moving average.
  • Uneven seasonal period \(m\): trend approximated by an \(m\) moving average.

Additive schemes - classical decomposition

Step 1 - Estimate trend

  • If \(m\) is an even number, compute the trend-cycle component \(T_t\) using 2 x \(m\)-MA.
  • If \(m\) is an odd number compute the trend-cycle component \(T_t\) using an \(m\)-MA.

Output: the trend component \(T_t\)

Step 2 - Detrended series

Calculate the de-trended series: \(D_t = y_t - T_t\)

Output: the detrended series \(D_t\)

Step 3 - Compute seasonal component

Main assumption to compute the seasonal component: the seasonal component remains constant over time. This is one of the main weaknesses of the classical decomposition.

Step 3.1. Average the de-trended values for each component of the season.

  • Example: with monthly data the seasonal component for march is the average of all the detrended March values in the data

  • Output: a vector of length \(m\), with \(m\) being the length of the seasonal compoent. + Let us call this vector \(S_unadj\) for unadjusted seasonal component


Step 3.2. Adjust \(S_{unadj}\), the output of step 3.1 to ensure that the sum of its values add to 0.

  • 3.2.1 Sum all the components of the vector \(S_{unadj}\). Lets call this output \(a\). This result is the total deviation from 0 we want to correct.

  • 3.2.2 Divide the previous result \(a\) by the length of the seasonal period \(m\). Then subtract that amount \(a/m\) from every component of \(S_{unadj}\)

  • Output: the seasonal component \(S\).

By doing this the seasonal component will not contribute to the mean value of the time series over one period. That is, the average of the seasonal component computed in this manner over one period will be 0.

\[ \text{condition imposed: }\sum_{i=1}^{m} S_i= 0 \]

As a result, the seasonal component does not contribute to the average of the time series over one period

\[ \frac{\sum_{i=1}^my_i}{m} = \frac{\sum_{i=1}^m(T_i + S_i + R_i)}{m} \underset{\underset{\sum_{i=1}^mS_i = 0}{\uparrow}}{=} \frac{\sum_{i=1}^m(T_i + R_i)}{m} \]

Step 4 - Compute the remainder component

\(R_t = D_t - S_t = y_t - T_t - S_t\)

Example: computation using the function classical_decomposition():

# Filter the series and select relevant columns
us_retail_employment <- us_employment %>%
  filter(year(Month) >= 1990, Title == "Retail Trade") %>%
  select(-Series_ID)

# Let us compute the classical decomposition using the function in the feasts library:
classical_dec <- us_retail_employment %>%
  
  # Fit the classical decomposition model specifying an additive scheme
  model(
    clas_deic = classical_decomposition(Employed, type = "additive")
  ) %>%
  
  # Extract components from the fitted model
  components()

select(classical_dec, Month, Employed, trend, seasonal, random)
# A tsibble: 357 x 5 [1M]
      Month Employed  trend seasonal random
      <mth>    <dbl>  <dbl>    <dbl>  <dbl>
 1 1990 Jan   13256.    NA    -75.5   NA   
 2 1990 Feb   12966.    NA   -273.    NA   
 3 1990 Mar   12938.    NA   -253.    NA   
 4 1990 Apr   13012.    NA   -190.    NA   
 5 1990 May   13108.    NA    -88.9   NA   
 6 1990 Jun   13183.    NA    -10.4   NA   
 7 1990 Jul   13170. 13178.   -13.3    5.65
 8 1990 Aug   13160. 13161.    -9.99   8.80
 9 1990 Sep   13113. 13141.   -87.4   59.9 
10 1990 Oct   13185. 13117.    34.6   33.8 
# ℹ 347 more rows
classical_dec %>% 
  autoplot()
Warning: Removed 6 rows containing missing values (`geom_line()`).

Example: manual computation implementing the described algorithm.

Step 1: compute the trend using moving averages

# Compute moving averages to estimate the trend
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- us_retail_employment %>%
  mutate(
    
    `12-MA` = slider::slide_dbl(Employed, mean,
                .before = 5, .after = 6, .complete = TRUE),
    `2x12-MA` = slider::slide_dbl(`12-MA`, mean,
                .before = 1, .after = 0, .complete = TRUE)
    
  )

# Plot the computed trend
manual_decomposition %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y = `2x12-MA`), colour = "#D55E00") +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail")
Warning: Removed 12 rows containing missing values (`geom_line()`).

Step 2 - detrend the time series

# Compute the detrended component:
manual_decomposition <- 
  manual_decomposition %>%
  mutate(
      detrended = Employed - `2x12-MA`,
    )
manual_decomposition %>% autoplot(detrended)
Warning: Removed 12 rows containing missing values (`geom_line()`).

This is the resulting detrended component, which contains the seasonal + remainder component: \(D_t = y_t - T_t = S_t + R_t\)

Step 3: compute the seasonal component

Step 3.1. Compute \(S_{unadj}\) averaging the de-trended values for each element of the season:

We will store the result in a separate dataframe called df_seasonal. This dataframe will contain 2 columns:

  1. An identifier for the number of the month. This identifier will be the same for every month, regardless of the year. More generally, we need to compute an identifier for each element of the seasonal component.
    • We will use this identifier to group the data and compute the mean value
  2. s_unadj: we compute it averaging the detrended series for each element of the seasonal component. For example, with monthly data, the seasonal component for March is the average of all the detrended March values in the data

We will start by creating the mentioned identifier for each month (regardless of the year) on our dataframe holding the decomposition:

manual_decomposition <- 
  
  manual_decomposition %>% 
  
  # Compute an identifier for each month, regardless
  # of the year. This will define the subsequent groups
  mutate(
    n_month = month(Month)
  )

manual_decomposition
# A tsibble: 357 x 7 [1M]
      Month Title        Employed `12-MA` `2x12-MA` detrended n_month
      <mth> <chr>           <dbl>   <dbl>     <dbl>     <dbl>   <dbl>
 1 1990 Jan Retail Trade   13256.     NA        NA      NA          1
 2 1990 Feb Retail Trade   12966.     NA        NA      NA          2
 3 1990 Mar Retail Trade   12938.     NA        NA      NA          3
 4 1990 Apr Retail Trade   13012.     NA        NA      NA          4
 5 1990 May Retail Trade   13108.     NA        NA      NA          5
 6 1990 Jun Retail Trade   13183.  13186.       NA      NA          6
 7 1990 Jul Retail Trade   13170.  13170.    13178.     -7.66       7
 8 1990 Aug Retail Trade   13160.  13151.    13161.     -1.20       8
 9 1990 Sep Retail Trade   13113.  13130.    13141.    -27.5        9
10 1990 Oct Retail Trade   13185.  13103.    13117.     68.5       10
# ℹ 347 more rows

Now we will use this newly created identifier to compute the averages of the detrended time series for each month regardless of the year. Expressed in more general terms, we are computing the averages of the detrended time series for each element of the seasonal component.

df_seasonal <- 
  
  manual_decomposition %>%
  
  # turn into a tibble to be able to use group_by
  # without the limitations on a tsibble
  as_tibble() %>% 
  
  # group by the created identifier
  group_by(n_month) %>%
  
  # compute the average seasonal component for every month
  summarise(
    s_unadj = mean(detrended, na.rm = TRUE)
  ) %>%
  
  ungroup() %>% 
  
  # Turn back into a tsibble
  as_tsibble(index = n_month)

df_seasonal
# A tsibble: 12 x 2 [1]
   n_month s_unadj
     <dbl>   <dbl>
 1       1   -75.5
 2       2  -273. 
 3       3  -253. 
 4       4  -190. 
 5       5   -88.9
 6       6   -10.4
 7       7   -13.3
 8       8   -10.0
 9       9   -87.4
10      10    34.6
11      11   394. 
12      12   573. 
#Check the aspect of the obtained unadjusted seasonal component
df_seasonal %>%
  
  autoplot(s_unadj) +
  
  scale_x_continuous(breaks = seq(1, 12),
                     minor_breaks = seq(1, 12)) +
  
  geom_point(color = "red")

This is the resulting unadjusted seasonal component \(S_{unadj}\):

  • The first point is the unadjusted value of the seasonal component for January.
  • The second point is the unadjusted value of the seasonal component for February.
  • The third point is the unadjusted value of the seasonal component for March.

Step 3.2 Adjust the seasonal component

Let us check if the values of the component add up to 0 as we wanted to impose:

#The sum does not add up to 0
a = sum(df_seasonal$s_unadj)
a
[1] -0.2273091

We can see that the sum is not 0. We will divide the sum by the length of the seasonal component (in this case \(m\) = 12) and then distribute this deviation from 0 evenly over the seasonal component.

df_seasonal <- 
  
  df_seasonal %>%
  
  # Correction so that the sum of the seasoan components is 0
  mutate(
        seasonal_c = s_unadj  - a/12
    ) %>% 
  
  # Drop s_unadj since we do not need it anymore.
  select(n_month, seasonal_c)

sum(df_seasonal$seasonal_c)
[1] 1.136868e-13

As you can see, the seasonal component is now 0. Great. Let us bring the seasonal component back into our original dataset. For this we will use a join operation. Remember that we created previously a column called n_month that now will come in handy when doing this join:

# Bring seasonal component into the dataset (I chose to use a "join" statement, very convenient)
manual_decomposition <- 
  left_join(manual_decomposition, df_seasonal, by = "n_month")

manual_decomposition
# A tsibble: 357 x 8 [1M]
      Month Title        Employed `12-MA` `2x12-MA` detrended n_month seasonal_c
      <mth> <chr>           <dbl>   <dbl>     <dbl>     <dbl>   <dbl>      <dbl>
 1 1990 Jan Retail Trade   13256.     NA        NA      NA          1     -75.5 
 2 1990 Feb Retail Trade   12966.     NA        NA      NA          2    -273.  
 3 1990 Mar Retail Trade   12938.     NA        NA      NA          3    -253.  
 4 1990 Apr Retail Trade   13012.     NA        NA      NA          4    -190.  
 5 1990 May Retail Trade   13108.     NA        NA      NA          5     -88.9 
 6 1990 Jun Retail Trade   13183.  13186.       NA      NA          6     -10.4 
 7 1990 Jul Retail Trade   13170.  13170.    13178.     -7.66       7     -13.3 
 8 1990 Aug Retail Trade   13160.  13151.    13161.     -1.20       8      -9.99
 9 1990 Sep Retail Trade   13113.  13130.    13141.    -27.5        9     -87.4 
10 1990 Oct Retail Trade   13185.  13103.    13117.     68.5       10      34.6 
# ℹ 347 more rows

Step 4: compute the remainder component removing the seasonal component from the detrended series

Now that we have the seasonal component, we obtain the remainder component by subtracting it from the de-trended series.

# Compute remainder component:
manual_decomposition <- 
  manual_decomposition %>%
  mutate(
        remainder = detrended - seasonal_c
      )

select(manual_decomposition, Month, `2x12-MA`, seasonal_c, remainder)
# A tsibble: 357 x 4 [1M]
      Month `2x12-MA` seasonal_c remainder
      <mth>     <dbl>      <dbl>     <dbl>
 1 1990 Jan       NA      -75.5      NA   
 2 1990 Feb       NA     -273.       NA   
 3 1990 Mar       NA     -253.       NA   
 4 1990 Apr       NA     -190.       NA   
 5 1990 May       NA      -88.9      NA   
 6 1990 Jun       NA      -10.4      NA   
 7 1990 Jul    13178.     -13.3       5.65
 8 1990 Aug    13161.      -9.99      8.80
 9 1990 Sep    13141.     -87.4      59.9 
10 1990 Oct    13117.      34.6      33.8 
# ℹ 347 more rows

We have all the components. We are now going to compute the maximum absolute errors of our manual computation, comparing it with the result provided by the function classical_decomposition()

# Evaluation of absolute errors
max_err_seasonal = max(classical_dec$seasonal - manual_decomposition$seasonal_c, na.rm = TRUE)
max_err_trend = max(classical_dec$trend - manual_decomposition$`2x12-MA`, na.rm = TRUE)
max_err_rand = max(classical_dec$random - manual_decomposition$remainder, na.rm = TRUE)

# Compute the max value of all these vectors
max(c(max_err_seasonal, max_err_trend, max_err_rand))
[1] 6.025402e-12

QUESTION Can you provide an interpretation of this result?

The result is not 0 but it is sufficiently close to 0 that we can attribute this deviation to numerical errors (particularly given the fact that this error is much smaller than any of the components we want to compute).

Extra: generate a plot of the components we just created

# As an additional ggplot visualization exercise, below the code to visualize
# the decomposition we have computed manually

# Define levels for the factor variable to ensure proper ordering
# in the facet grid figure.
factor_levels = c("Employed", "trend", "seasonal_c", "remainder")

manual_decomposition_long <- 
  
  manual_decomposition %>% 

    # Rename so that the labels are consistent
    rename(trend = `2x12-MA`) %>% 
  
    # Select the columns of interest
    select(Month, Employed, trend, seasonal_c, remainder) %>% 
  
    # Generate long format based on a single variable so that we can
    # facet_grid following this variable
    pivot_longer(cols = Employed:remainder, names_to = "component") %>% 
    
    # Convert to factor to ensure proper ordering in the figure
    mutate(
      component = factor(component, levels=factor_levels)
    )

# Now generate the graph
manual_decomposition_long %>% 
  
  ggplot(aes(x = Month, y = value)) +
  
  geom_line() +
  
  # scales = "free_y" so that each component has an adapted y-axis range
  facet_grid(component ~ ., scales = "free_y")
Warning: Removed 6 rows containing missing values (`geom_line()`).

We can see that it looks just like the decomposition returned by the fable library.

Multiplicative scheme - classical decomposition

In this case substractions are simply replaced by divisions

Step 1 - Estimate trend

If \(m\) is an even number, compute the trend-cycle component \(\hat{T_t}\) using 2 x \(m\)-MA. If \(m\) is an odd number compute the trend-cycle component \(\hat{T_t}\) using an \(m\)-MA.

Step 2 - Detrended series

Calculate the de-trended series: \(y_t / \hat{T_t}\)

Step 3 - Compute seasonal component

We will make the underlying assumption that the seasonal component remains the same for every season. This is one of the main weaknesses of the classical decomposition.

  1. Average the de-trended values for each component of the season.

    • Example: with monthly data the seasonal component for march is the average of all the detrended March values in the data


  1. Adjust the seasonal component values to ensure they add to \(m\). This ensures that the average of the seasonal component over one period is 1.

    2.1 Sum the seasonal components of everymonth. The result is the total deviation from 0 of the sum of the seasonal components.

    2.2 Divide the previous result by the seasonal period and substract that amount from each seasonal component.

Since the seasonal components add to \(m\), the average of the time series over one period is a weighted average of the product of the trend and the remainder components, with the seasonal components divided by \(m\) as weights that add to 1.In this case the weighted average is not symmetrical. If \(m\) is the length of the seasonal period we have:

\[ \sum_{i=1}^m{S_i} = m \]

\[ \frac{\sum_{i=1}^m(T_i S_i R_i)}{m} = \underbrace{\sum_{i=1}^m\frac{Si}{m}(T_i R_i) }_{weighted\;mov.\;av.\\ of\;R_iT_i\;since\sum_{i=1}^mS_i/m = 1} \]

Step 4 - Compute the remainder component

\(\hat{R} = y_t /\hat{T_t}\hat{S_t}\)

Comments on classical decomposition:

Classical decomposition is still widely used, but it is not recommended. It is however the foundation for many of the other methods and understanding it is important. Some problems with classical decomposition are summarised below:

  • Because the trend-cycle is approximated using centered moving averages, there is no estimate of the trend-cycle or the remainder component for the few first and last observations.
  • The trend-cycle estimate tends to over-smooth rapid rises and falls in the data.
  • Assumes that the seasonal component repeats from year to year.
    • For many series this is reasonable
    • For other longer series it is not. For example, electricity demand patterns have changed over time as air conditioning has become more available. This has shifted the maximum of the seasonal component from winter to summer.
  • The classical decomposition method is not robust enough against unusual values or outliers in the series.

Seasonally adjusted data:

The seasonally adjusted data is the reslt of removing the seasonal component from the original data

  • Additive decomposition: \(y_t - St\)
  • Multiplicative decomposition: \(y_t / S_t\)

It contains **the rmainder as well as the trend-cucle component*. Therefore they are not “smooth”. “Downturns” or “upturms” can be misleading.

  • If you want to look for turning poins and interpret changes in direction, it is best to use the trend cycle component rather than the seasonally adjusted data

The seasonally adjusted data is useful if the variation due to seasonality is not of primary interest.

  • Example: unemployment data are usually seasonally adjusted in order to highlight variation due to the underlying state of the economy rather than the seasonal variation
classical_dec %>%
  as_tsibble() %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Persons (thousands)",
       title = "Total employment in US retail")

Exercise 1

Consider the last five years of the Gas data from aus_production

gas <- tail(aus_production, 5*4) %>% select(Gas)
gas
# A tsibble: 20 x 2 [1Q]
     Gas Quarter
   <dbl>   <qtr>
 1   221 2005 Q3
 2   180 2005 Q4
 3   171 2006 Q1
 4   224 2006 Q2
 5   233 2006 Q3
 6   192 2006 Q4
 7   187 2007 Q1
 8   234 2007 Q2
 9   245 2007 Q3
10   205 2007 Q4
11   194 2008 Q1
12   229 2008 Q2
13   249 2008 Q3
14   203 2008 Q4
15   196 2009 Q1
16   238 2009 Q2
17   252 2009 Q3
18   210 2009 Q4
19   205 2010 Q1
20   236 2010 Q2
  1. Use classical_decomposition with type=multiplicative to calculate the trend-cycle and seasonal indices
# Let us compute the classical decomposition using the function in the feasts library:
classical_dec <- gas %>%
  model(
    classical_decomposition(Gas ~ season(4), type = "multiplicative")
  ) %>%
  components()

classical_dec
# A dable: 20 x 7 [1Q]
# Key:     .model [1]
# :        Gas = trend * seasonal * random
   .model                      Quarter   Gas trend seasonal random season_adjust
   <chr>                         <qtr> <dbl> <dbl>    <dbl>  <dbl>         <dbl>
 1 "classical_decomposition(G… 2005 Q3   221   NA     1.13  NA              196.
 2 "classical_decomposition(G… 2005 Q4   180   NA     0.925 NA              195.
 3 "classical_decomposition(G… 2006 Q1   171  200.    0.875  0.974          195.
 4 "classical_decomposition(G… 2006 Q2   224  204.    1.07   1.02           209.
 5 "classical_decomposition(G… 2006 Q3   233  207     1.13   1.00           207.
 6 "classical_decomposition(G… 2006 Q4   192  210.    0.925  0.987          208.
 7 "classical_decomposition(G… 2007 Q1   187  213     0.875  1.00           214.
 8 "classical_decomposition(G… 2007 Q2   234  216.    1.07   1.01           218.
 9 "classical_decomposition(G… 2007 Q3   245  219.    1.13   0.996          218.
10 "classical_decomposition(G… 2007 Q4   205  219.    0.925  1.01           222.
11 "classical_decomposition(G… 2008 Q1   194  219.    0.875  1.01           222.
12 "classical_decomposition(G… 2008 Q2   229  219     1.07   0.974          213.
13 "classical_decomposition(G… 2008 Q3   249  219     1.13   1.01           221.
14 "classical_decomposition(G… 2008 Q4   203  220.    0.925  0.996          219.
15 "classical_decomposition(G… 2009 Q1   196  222.    0.875  1.01           224.
16 "classical_decomposition(G… 2009 Q2   238  223.    1.07   0.993          222.
17 "classical_decomposition(G… 2009 Q3   252  225.    1.13   0.994          224.
18 "classical_decomposition(G… 2009 Q4   210  226     0.925  1.00           227.
19 "classical_decomposition(G… 2010 Q1   205   NA     0.875 NA              234.
20 "classical_decomposition(G… 2010 Q2   236   NA     1.07  NA              220.
  1. Compute and plot the seasonally adjusted data.
classical_dec %>%
  
  as_tsibble() %>%
  
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y=season_adjust), colour = "#0072B2") +
  labs(y = "Gas production (Petajoules)",
       title = "Australian gas production")

  1. Compute manually the classical decomposition of the series (as we did in class, but keep in mind that this scheme is multiplicative and that now the seasonal period is 4 instead of 12).
### 1. Compute the trend with a 2x4 MA

# Compute moving averages
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- gas %>%
  mutate(
    `4-MA` = slider::slide_dbl(Gas, mean,
                     .before = 1, .after = 2, .complete = TRUE),
    `2x4-MA` = slider::slide_dbl(`4-MA`, mean,
                      .before = 1, .after = 0, .complete = TRUE)
  )

# Plot the computed trend
manual_decomposition %>%
  autoplot(Gas, colour = "gray") +
  geom_line(aes(y = `2x4-MA`), colour = "#D55E00") +
  labs(y = "Gas production (Petajoules)",
       title = "Australian gas production")
Warning: Removed 4 rows containing missing values (`geom_line()`).

### 2. Compute the de-trended component:

# Compute the detrended component:
manual_decomposition <- manual_decomposition %>%
  mutate(detrended = Gas / `2x4-MA`,
         n_quarter = quarter(Quarter)) # Add an identifier for the month (used later for a join statement)

manual_decomposition %>% autoplot(detrended)
Warning: Removed 4 rows containing missing values (`geom_line()`).

### 3. Compute the seasonal component grouping and averaging the detrended component

res <- manual_decomposition %>%
  index_by(n_quarter) %>%
  
  #Compute seasonal component
  summarise(
    seasonal_c = mean(detrended, na.rm = TRUE)
  ) 

res
# A tsibble: 4 x 2 [1]
  n_quarter seasonal_c
      <int>      <dbl>
1         1      0.875
2         2      1.07 
3         3      1.13 
4         4      0.925
#Check the aspect of the obtained seasonal component
res %>%
  autoplot(seasonal_c)

### 4. Correct the seasonal components so that they add to m = 4

m = 4
sum_dev = m - sum(res$seasonal_c)

res <- res %>%
  # Correction so that the sum of the seasoan components is 0
  mutate(seasonal_c = seasonal_c  + sum(sum_dev)/m)

sum(res$seasonal_c)
[1] 4
#Check the aspect of the obtained seasonal component
res %>%
  autoplot(seasonal_c)

# Bring seasonal component into the dataset 
# (I chose to use a "join" statement, very convenient)
manual_decomposition <- left_join(manual_decomposition, res, by = "n_quarter")
manual_decomposition
# A tsibble: 20 x 7 [1Q]
     Gas Quarter `4-MA` `2x4-MA` detrended n_quarter seasonal_c
   <dbl>   <qtr>  <dbl>    <dbl>     <dbl>     <int>      <dbl>
 1   221 2005 Q3    NA       NA     NA             3      1.13 
 2   180 2005 Q4   199       NA     NA             4      0.925
 3   171 2006 Q1   202      200.     0.853         1      0.875
 4   224 2006 Q2   205      204.     1.10          2      1.07 
 5   233 2006 Q3   209      207      1.13          3      1.13 
 6   192 2006 Q4   212.     210.     0.913         4      0.925
 7   187 2007 Q1   214.     213      0.878         1      0.875
 8   234 2007 Q2   218.     216.     1.08          2      1.07 
 9   245 2007 Q3   220.     219.     1.12          3      1.13 
10   205 2007 Q4   218.     219.     0.937         4      0.925
11   194 2008 Q1   219.     219.     0.887         1      0.875
12   229 2008 Q2   219.     219      1.05          2      1.07 
13   249 2008 Q3   219.     219      1.14          3      1.13 
14   203 2008 Q4   222.     220.     0.921         4      0.925
15   196 2009 Q1   222.     222.     0.883         1      0.875
16   238 2009 Q2   224      223.     1.07          2      1.07 
17   252 2009 Q3   226.     225.     1.12          3      1.13 
18   210 2009 Q4   226.     226      0.929         4      0.925
19   205 2010 Q1    NA       NA     NA             1      0.875
20   236 2010 Q2    NA       NA     NA             2      1.07 
# Compute remainder component:
manual_decomposition <- manual_decomposition %>%
  mutate(remainder = detrended / seasonal_c)

select(manual_decomposition, Quarter, Gas, `2x4-MA`, seasonal_c, remainder)
# A tsibble: 20 x 5 [1Q]
   Quarter   Gas `2x4-MA` seasonal_c remainder
     <qtr> <dbl>    <dbl>      <dbl>     <dbl>
 1 2005 Q3   221      NA       1.13     NA    
 2 2005 Q4   180      NA       0.925    NA    
 3 2006 Q1   171     200.      0.875     0.974
 4 2006 Q2   224     204.      1.07      1.02 
 5 2006 Q3   233     207       1.13      1.00 
 6 2006 Q4   192     210.      0.925     0.987
 7 2007 Q1   187     213       0.875     1.00 
 8 2007 Q2   234     216.      1.07      1.01 
 9 2007 Q3   245     219.      1.13      0.996
10 2007 Q4   205     219.      0.925     1.01 
11 2008 Q1   194     219.      0.875     1.01 
12 2008 Q2   229     219       1.07      0.974
13 2008 Q3   249     219       1.13      1.01 
14 2008 Q4   203     220.      0.925     0.996
15 2009 Q1   196     222.      0.875     1.01 
16 2009 Q2   238     223.      1.07      0.993
17 2009 Q3   252     225.      1.13      0.994
18 2009 Q4   210     226       0.925     1.00 
19 2010 Q1   205      NA       0.875    NA    
20 2010 Q2   236      NA       1.07     NA    
# Evaluation of absolute errors
max_err_seasonal = max(classical_dec$seasonal - manual_decomposition$seasonal_c, na.rm = TRUE)
max_err_trend = max(classical_dec$trend - manual_decomposition$`2x4-MA`, na.rm = TRUE)
max_err_rand = max(classical_dec$random - manual_decomposition$remainder, na.rm = TRUE)

max(c(max_err_seasonal, max_err_trend, max_err_rand))
[1] 3.720274e-06
The errors are small enough for us to consider that we have correctly 
implemented the algorithm in the function `classical_decomposition()`.

It would be interesting to explore the specific reasons for this deviation, 
but that is something we will not do.
  1. Change one observations to be an outlier (e.g., add 300 to one observation), and recompute the seasonally adjusted data. What is the effect of the outlier?
gas %>%
  mutate(Gas = if_else(Quarter == yearquarter("2007Q4"), Gas + 300, Gas)) %>%
  model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally adjusted data", y = "Petajoules")

* The outlier is more or less in the middle of the series.
* The trend is approximated using a 2x4-MA. Since the outlier is in the middle 
of the series, this information is in the region where the moving average can 
be computed and the information of the outlier is captured by the trend component.
* The detrended component also inherits the information of the outlier, 
since it is computed as $y_t/T_t$. 
* The seasonal component is computed averaging this detrended component and 
therefore is also affected by the outlier. It will not accurately approximate 
the seasonal component.
  1. Does it make any difference if the outlier is near the end rather than in the middle of the series?
As a result the seasonally adjusted data still has seasonality, because we have 
not adequately captured the seasonal component.
  1. Does it make any difference if the outlier is near the end rather than in the middle of the series?
gas %>%
  mutate(Gas = if_else(Quarter == yearquarter("2010Q2"), Gas + 300, Gas)) %>%
  model(decomp = classical_decomposition(Gas, type = "multiplicative")) %>%
  components() %>%
  as_tsibble() %>%
  autoplot(season_adjust) +
  labs(title = "Seasonally adjusted data", y = "Petajoules")
* The outlier is located at the last point of the series.
* The trend is approximated using a 2x4-MA. Since the outlier is at the very 
end of the series, this information is in the region where the 2x4 moving 
average cannot be computed and the information of the outlier is not captured 
by the trend component.
* The detrended component is defined in exactly the same region as the trend 
approximation, since it is computed as $y_t/T_t$. Therefore the detrended 
component does not capture the information about the outlier.
* The seasonal component is computed averaging this detrended component. 
Therefore it does not inherit information from the outlier. The seasonal 
component is not affected by the presence of this outlier.

Exercise 2

  1. Take the dataset vic_elec and aggregate the data to obtain the daily demand using index_by(). The command below computes the classical decomposition considering a period of 1 week (instead of 1 year) having filtered for the year 2012.
vic_elec_w <-
  
  vic_elec %>%
  
    index_by(Date) %>%
  
    summarize(
          avg_demand = mean(Demand, na.rm = TRUE)
      ) %>%
  
    filter(year(Date) == 2012)

vic_elec_w_dcmp <-
  
  vic_elec_w %>%
  
      model(
        decomp = classical_decomposition(avg_demand ~ season(period = 7), 
                                         type = "additive")
        ) %>%
  
      
      components() 


vic_elec_w_dcmp %>% autoplot()
Warning: Removed 3 rows containing missing values (`geom_line()`).

Compute manually this decomposition (no need to depict it, just compute the values within vic_elec_w_dcmp manually).

# Compute moving averages to estimate the trend
# Because the seasonal period is 12, we need a 2x12-MA
manual_decomposition <- vic_elec_w %>%
  mutate(
    `7-MA` = slider::slide_dbl(avg_demand, mean,
                .before = 3, .after = 3, .complete = TRUE)
  )

# Plot the computed trend
manual_decomposition %>%
  autoplot(avg_demand, colour = "gray") +
  geom_line(aes(y = `7-MA`), colour = "#D55E00") +
  labs(y = "average demand (MWh)",
       title = "Average weekly electricity demand")
Warning: Removed 6 rows containing missing values (`geom_line()`).

Let us do a first sanity check and compare the trend computed by the function
and the resulting of our moving average
trend_func <- vic_elec_w_dcmp$trend
trend_man <- manual_decomposition$`7-MA`

# Absolute errors at each point in time of the trend
abs_err <- abs(vic_elec_w_dcmp$trend - manual_decomposition$`7-MA`)

# Biggest numerical difference
# of order 1E-12 -> numerical noise, look at the order of magnitude
max(abs_err, na.rm = TRUE)
[1] 1.818989e-12
# Lets round to the 3rd decimal (accuracy of up to KWh) and then
# check equality
# This sum equals 0, which means that they are the same at least down
# to the third decimal (give it some thought)
sum(!(round(trend_func, 3) == round(trend_man, 3)))
[1] NA
# Alternative way to check this in R.
# This returns TRUE, meaning they are the same at least
# down to the third decimal
identical(round(trend_func, 3) , round(trend_man, 3))
[1] TRUE
# Compute the detrended component:
manual_decomposition <- manual_decomposition %>%
  mutate(
        # Compute detrended component
        detrended = avg_demand - `7-MA`,
        
        # Identifier added for the day of the week
        wday = wday(Date, label = FALSE, abbr = FALSE)
      )


# Plot detrended component (additive + seasonal)
manual_decomposition %>% autoplot(detrended)
Warning: Removed 6 rows containing missing values (`geom_line()`).

# Include the seasonal component

res <- manual_decomposition %>%
  index_by(wday) %>%
  
  #Compute seasonal component
  summarise(
    seasonal_c = mean(detrended, na.rm = TRUE)
  ) %>%
  
  ungroup()

res
# A tsibble: 7 x 2 [1]
   wday seasonal_c
  <dbl>      <dbl>
1     1      -548.
2     2       121.
3     3       189.
4     4       203.
5     5       233.
6     6       160.
7     7      -380.
#Check the aspect of the obtained seasonal component
res %>%
  autoplot(seasonal_c) +
  scale_x_continuous(breaks = seq(1, 7),
                     minor_breaks = seq(1, 7)) +
  geom_point(color = "red")

Now we make the necessary correction so that all the sum of the seasonal
component over an entire season = 0
corr <- sum(res$seasonal_c) / 7

res <- 
  res %>% 
  mutate(
    seasonal_c = seasonal_c - corr
  )
# Bring back our seasonal component to the time series
manual_decomposition <-
  manual_decomposition %>% 
    left_join(res, by = "wday")
Let us introduce a new sanity check and check that our seasonal component
matches the component from the function `classical_decomposition()`
seasonal_func <- round(vic_elec_w_dcmp$seasonal, 3)
seasonal_man <-  round(manual_decomposition$seasonal_c, 3)

# This sum is 0, meaning that both vectors coincide at least down to
# 3 decimals. Give this some thought to understand it.
sum(!(seasonal_func == seasonal_man))
[1] 0
# Max absolute error of order 1E-14
max(vic_elec_w_dcmp$seasonal - manual_decomposition$seasonal_c)
[1] 5.684342e-14
# Finally, compute the remainder component
manual_decomposition <- 
  manual_decomposition %>% 
  mutate(
    remainder = detrended - seasonal_c
  )
# Last check - compare both remainders (a.k.a random)
remainder_func <- round(vic_elec_w_dcmp$random, 3)
remainder_man <-  round(manual_decomposition$remainder, 3)

# This sum is 0, meaning that both vectors coincide at least down to
# 3 decimals. Give this some thought to understand it.
sum(!(remainder_func[!is.na(remainder_func)] == remainder_man[!is.na(remainder_man)]))
[1] 0
# As an additional ggplot visualization exercise, below the code to visualize
# the decomposition we have computed manually

# Define levels for the factor variable to ensure proper ordering
# in the facet grid figure.
factor_levels = c("avg_demand", "trend", "seasonal_c", "remainder")

manual_decomposition_long <- 
  
  manual_decomposition %>% 

    # Rename so that the labels are consistent
    rename(trend = `7-MA`) %>% 
  
    # Select the columns of interest
    select(Date, avg_demand, trend, seasonal_c, remainder) %>% 
  
    # Generate long format based on a single variable so that we can
    # facet_grid following this variable
    pivot_longer(cols = avg_demand:remainder, names_to = "component") %>% 
    
    # Convert to factor to ensure proper ordering in the figure
    mutate(
      component = factor(component, levels=factor_levels)
    )

# Now generate the graph
manual_decomposition_long %>% 
  
  ggplot(aes(x = Date, y = value)) +
  
  geom_line() +
  
  # scales = "free_y" so that each component has an adapted y-axis range
  facet_grid(component ~ ., scales = "free_y")
Warning: Removed 3 rows containing missing values (`geom_line()`).